Human Genetics and Genomics Advances

Elsevier BV

Preprints posted in the last 90 days, ranked by how well they match Human Genetics and Genomics Advances's content profile, based on 70 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
Improving GWAS performance in underrepresented groups by appropriate modeling of genetics, environment, and sociocultural factors

Cataldo-Ramirez, C.; Lin, M.; McMahon, A.; Gignoux, C.; Weaver, T. D.; Henn, B. M.

2026-04-08 genetics 10.1101/2024.10.28.620716 medRxiv
Top 0.1%
16.7%

Genome-wide association studies (GWAS) and polygenic score (PGS) development are typically constrained by the data available in biobank repositories in which European cohorts are vastly overrepresented. Here, we increase the utility of non-European participant data within the UK Biobank (UKB) by characterizing the genetic affinities of UKB participants who self-identify as Bangladeshi, Indian, Pakistani, "White and Asian" (WA), and "Any Other Asian" (AOA), towards creating a more robust South Asian sample size for future genetic analyses. We assess the relationships between genetic structure and self-selected ethnic identities and use consistent patterns of clustering in the dataset to train a support vector machine (SVM). The SVM was utilized to reassign n = 1,853 AOA and WA participants at the subcontinental level, and increase the sample size of the UKB South Asian group by 1,381 additional participants. We further leverage these samples to assess GWAS performance and PGS development. We include environmental covariates in the height GWAS by implementing a rigorous covariate selection procedure, and compare the outputs of two GWAS models: GWASnull and GWASenv. We show that PGS derived from both GWAS models yield prediction comparable to that of PGS models developed with an order of magnitude more training data, and that environmentally adjusted PGS models reduce the sex bias in predictive performance. In summary, we demonstrate how GWAS performance can be improved by leveraging ambiguous ethnicity codes, ancestry-matched imputation panels, and including environmental covariates.

2
Inactivating PLEKHA6 Mutations Cause Idiopathic Hypogonadotropic Hypogonadism Through Impaired Kisspeptin Secretion

Topaloglu, A. K.; Plummer, L.; Su, C.-W.; Kotan, L. D.; Celmeli, G.; Simsek, E.; Zhao, Y.; Stamou, M.; Anik, A.; Döger, E.; Altıncık, S. A.; Mengen, E.; Koc, A. F.; Akkus, G.; Balasubramanian, R.; Turan, I.; Seminara, S. B.; Yuksel, B.

2026-04-13 pediatrics 10.64898/2026.04.10.26349358 medRxiv
Top 0.1%
12.8%

Purpose: Idiopathic hypogonadotropic hypogonadism (IHH) is characterized by impaired reproductive maturation, and approximately half of all cases lack an identified genetic cause. We investigated the genetic basis of IHH in two large cohorts to identify novel disease-causing genes. Methods: We analyzed exome and genome sequencing data from 1,822 patients with IHH from two independent cohorts. Rare variants were filtered using pedigree-informed inheritance models. PLEKHA6 expression in the postmortem human hypothalamus was tested at the mRNA and protein level. Functional studies assessed kisspeptin secretion in cell-based assays. Results: We identified 18 distinct PLEKHA6 variants in 24 patients from 20 unrelated families (1.3% of cohort). Variants segregated with disease under autosomal recessive and autosomal dominant (with variable penetrance) inheritance patterns. PLEKHA6 was robustly expressed in the hypothalamus and showed clear colocalization with neurokinin B, which served as the marker for the GnRH pulse generator. Functional studies demonstrated that patient variants significantly impaired kisspeptin secretion. Conclusion: PLEKHA6 is a novel IHH gene and the first reported regulator of kisspeptin secretion from the kisspeptin-neurokinin B-dynorphin (KNDy) neurons, which have recently been established as the GnRH pulse generator. These findings establish impaired kisspeptin release as a new disease mechanism in IHH and highlight the critical role of neuropeptide trafficking in reproductive function.

3
Genome-Wide Significance Reconsidered: Low-Frequency Variants and Regulatory Networks in Autism

Mendes de Aquino, M.; Engchuan, W.; Thompson, S.; Zhou, X.; Safarian, N.; Chen, D. Z.; Trost, B.; Salazar, N. B.; Ma, C.; Thiruvahindrapuram, B.; Vorstman, J.; Scherer, S. W.; Breetvelt, E.

2026-02-12 genetic and genomic medicine 10.64898/2026.02.11.26346090 medRxiv
Top 0.1%
6.7%

Low-frequency variants (LFVs), defined by minor allele frequencies (MAF) of 1-5%, occupy the gap between common and rare variants in both frequency and effect size. The conventional genome-wide association study (GWAS) significance threshold (5×10⁻⁸) is overly conservative for LFVs, which account for more than 25% of variants in GWAS. This limitation may obscure meaningful associations in highly heritable yet genetically complex disorders such as autism spectrum disorder (ASD). We hypothesize that the scarcity of significant LFVs in ASD GWAS reflects statistical constraints rather than a true lack of association. To address this, we derived a MAF-specific genome-wide significance threshold using linkage disequilibrium-informed simulations applied to ASD GWAS summary statistics, identifying 2.03x10- as optimal. Applying this threshold revealed three novel LFVs mapping to zinc finger proteins (ZNF420, ZNF781) and known ASD-related genes (KMT2E, PRKDC, MCM4). Enrichment analyses suggested their function in nervous system development and gene regulation. Our findings highlight the contribution of LFVs to ASD risk and underscore the importance of frequency-aware association strategies.

4
Anti-inflammatory and pro-proliferative effects of fasudil in human trisomy 21 neural progenitor cells

Baxter, L. L.; Lee, S.; Fuentes, K.; Mosley, I.; Raymond, J.; Guedj, F.; Slonim, D.; Zhou, D.; Glotfelty, E.; Tweedie, D.; Grieg, N.; Bianchi, D.

2026-03-20 pharmacology and toxicology 10.64898/2026.03.19.712922 medRxiv
Top 0.1%
6.4%

Down syndrome (DS) results from trisomy for human chromosome 21 and is the most frequent genetic cause of intellectual disability. No effective treatments currently exist that improve neurodevelopment and cognition. Atypical brain development in individuals with DS is apparent before birth, which suggests that the optimal time to begin administration of therapies is prenatally. Human neural progenitor cell (NPC) cultures provide a tractable in vitro model system to examine the effects of trisomy 21 (T21) on neurodevelopment and to measure the effects of pharmacological interventions. Here we report the results of preclinical studies evaluating 24 candidate therapies. RNA-Seq analyses found that euploid and T21 NPCs showed different transcriptomic responses to five candidate pharmacotherapies. The Rho-associated coiled-coil kinase (ROCK) inhibitor fasudil increased proliferation of T21 NPCs, reduced expression of inflammatory pathway genes in T21 NPCs, and reduced markers of inflammation in LPS-stimulated microglia model systems. These results demonstrate that fasudil can alter multiple T21-associated abnormalities in a beneficial manner, suggesting that fasudil warrants further study as a candidate prenatal pharmacotherapy for DS.

5
Proteogenomic analysis of 5,411 plasma proteins in sickle cell disease patients

Groza, C.; Chignon, A.; Lo, K. S.; Bellegarde, V.; Bartolucci, P.; Lettre, G.

2026-04-07 genetic and genomic medicine 10.64898/2026.04.06.26350255 medRxiv
Top 0.1%
6.4%

There are few therapeutic options to treat patients with sickle cell disease (SCD), a blood disorder caused by mutations in the β-globin gene that affects >7M individuals worldwide. Combining human genetics and high-throughput proteomics can help identify new drug targets. Here, we present results from a proteogenomic analysis of the plasma proteome in SCD patients. We measured the levels of 5,411 plasma proteins and tested their associations with common genetic variation in 343 SCD patients. After conditional analyses, we identified 560 protein quantitative trait loci (pQTL), including 58 (10%) that are novel. Many of these pQTL are not specific to SCD patients and associate with clinically relevant traits in non-SCD African Americans from the Million Veteran Program (e.g. hemoglobin concentration, triglycerides). The effect sizes of the pQTL are largely concordant between SCD and non-SCD individuals, although we found examples (e.g. APOL1, haptoglobin) with evidence of heterogeneity that suggests an interaction between the plasma proteome and the SCD genotype. Finally, we combine pQTL and genome-wide association study results for fetal hemoglobin (HbF) in a Mendelian randomization analysis to prioritize five proteins that may increase HbF production (ENPP5, LBP, NAAA, PT3X, ZP3).

6
Assessing the clinical significance of a novel rare variant in Loeys-Dietz Syndrome by combining AI-driven modelling and cell biology

Boukrout, N.; Delage, C.; Comptdaer, T.; Arondal, W.; Jemel, A.; Azabou, N.; Bousnina, M.; Mallouki, M.; Sabaouni, N.; Arbi, R.; Kchaou, S.; Ammar, H.; Hantous-Zannad, S.; Jilani, H.; Elaribi, Y.; Benjemaa, L.; Van der Hauwaert, C.; Larrue, R.; Cheok, M.; Perrais, M.; Lefebvre, B.; Cauffiez, C.; Pottier, N.

2026-03-31 genetic and genomic medicine 10.64898/2026.03.30.26349510 medRxiv
Top 0.1%
6.2%

Loeys-Dietz syndrome (LDS) is an autosomal dominant connective-tissue disorder caused by genetic variants in TGF-β pathway genes, most often TGFBR1/2. While pathogenic TGFBR2 genetic mutations usually cluster in the kinase domain and disrupt SMAD signalling, distinguishing with confidence those with functional impact on TGFBR2 function from rare benign genetic alterations represents one of the most important ongoing challenges for accurate genetic testing. Therefore, there is a pressing need to develop methods that can improve functional variant interpretation. Here, we describe and characterize the functional impact of a novel genetic variant in the TGFBR2 kinase domain (E431K), in a patient with the clinical diagnosis of syndromic genetic aortopathy. We assessed the structural and functional consequences of this variant using AI-driven molecular modelling and in vitro cell-based assays. A high-quality homology-based model of TGFBR2 was generated and computational mutagenesis based on the structural context and evolutionary conservation was used to forecast variant pathogenicity. Relative to wild type, the variant affects protein stability by disrupting intramolecular interactions and likely induces conformational changes that may affect kinase activity and thus TGF-β signalling. This was experimentally confirmed by showing abnormal protein level and alteration of canonical TGF-β pathway activation. Overall, our results establish that the E431K variant leads to aberrant TGF-β signalling and confirm the diagnosis of Loeys-Dietz syndrome type 2 in this patient.

7
HIPK4 is a novel gene associated with teratozoospermia and male infertility

Koser, S. A.; Rieck, C.; Aprea, I.; Krallmann, C.; Gaikwad, A. S.; Wallmeier, J.; Tenardi-Wenge, R.; Di Persio, S.; Neuhaus, N.; Raidt, J.; Omran, H.; Laurentino, S.; Kliesch, S.; Stallmeyer, B.; Friedrich, C.; Tüttelmann, F.

2026-03-04 sexual and reproductive health 10.64898/2026.03.04.26346694 medRxiv
Top 0.1%
4.7%

STUDY QUESTION: Are pathogenic variants in Homeodomain-interacting protein kinase (HIPK4) associated with sperm head abnormalities causing male infertility? SUMMARY ANSWER: HIPK4 is a novel candidate gene associated with sperm head defects and human male infertility. WHAT IS KNOWN ALREADY: Numerous genes causing male infertility due to Multiple Morphological Abnormalities of the sperm flagella (MMAF) have been described but the genetic basis of sperm head defects is less well understood. STUDY DESIGN, SIZE, DURATION: Four infertile brothers displaying varying degrees of quantitatively and/or qualitatively impaired spermatogenesis, their parents, and their fertile brother were included in the study. Further, the Male Reproductive Genomics (MERGE) cohort comprising exome/genome sequencing data of >3,300 men was queried. PARTICIPANTS/MATERIALS, SETTING, METHODS: We performed exome sequencing in all five brothers and their parents. To characterise the sperm phenotype, standard semen analysis, immunofluorescence staining, and transmission-electron microscopy (TEM) were carried out. Further, we evaluated the impact of the HIPK4 variant in cell culture experiments using HEK293T cells. MAIN RESULTS AND THE ROLE OF CHANCE: Analysing the exome data, we could not identify a common genetic cause in all four affected brothers. However, one of the affected brothers was compound heterozygous for two loss-of-function variants in DNAH17 (c.1076_1077dup p.(Lys360*) and c.7752+2T>A p.?) associated with markedly reduced sperm motility and MMAF. The variants' pathogenicity was further validated by TEM of flagellar cross-sections revealing an outer dynein arm defect and axonemal disruption. On the contrary, his three infertile brothers were homozygous for the start-loss variant c.1A>G in HIPK4. This gene is expressed during spermiogenesis and is reportedly involved in sperm head shaping in mice. Heterologous expression of (partial) HIPK4 variant cDNA elucidated the alternative use of an in-frame start codon located 35 amino acids downstream, resulting in an N-terminally truncated protein p.(Met1_Glu35del). The truncated HIPK4 protein lacks parts of its kinase domain and shows reduced protein stability. In line with published mouse models, all three brothers displayed 100% abnormal sperm head morphology with variable defects. Importantly, one brother affected by HIPK4 variants fathered a child after successful intracytoplasmic sperm injection, demonstrating that it is a treatment option for HIPK4-related teratozoospermia. No further men from the MERGE cohort were affected by biallelic HIPK4 variants. Taken together, HIPK4 is an autosomal-recessive candidate gene associated with sperm head defects and male infertility. LARGE SCALE DATA: The reported variants in DNAH17 and HIPK4 were submitted to ClinVar. LIMITATIONS, REASONS FOR CAUTION: Independent replication is required to assess the phenotypic spectrum and the reproductive outcome associated with biallelic HIPK4 variants and to formally establish the gene-disease relationship for male infertility. WIDER IMPLICATIONS OF THE FINDINGS: This study raises awareness of the significant genetic heterogeneity of male infertility. The described family highlights that distinct genetic causes may underlie a seemingly similar phenotype. Exome sequencing of families is helpful to efficiently disentangle individual causes among affected family members. STUDY FUNDING/COMPETING INTEREST(S): N.N., J.R., H.O., S.L., C.F., and F.T. were supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) within the Clinical Research Unit Male Germ Cells (CRU326, project number 329621271). R.T.W., N.N., J.R., H.O., and F.T. were supported by the Federal Ministry of Research, Technology and Space (BMFTR) as part of the project ReproTrack.MS (grant 01GR2303). S.A.K. was supported by the DFG Clinician Scientist programme CareerS Munster (project number 493624047). A.S.G. was supported by the Medical Faculty Munster via an Innovative Medical Research (IMF) grant (GA-122104).

8
Cross-ancestry performance of Parkinson's disease polygenic risk scores in admixed Latin American populations

Flores-Ocampo, V.; Reyes-Perez, P.; Ogonowski, N. S.; Sevilla-Parra, G.; Diaz-Torres, S.; Leal, T. P.; Waldo, E.; Ruiz-Contreras, A. E.; Alcauter, S.; Arguello-Pascualli, P.; Mata, I. F.; Renteria, M. E.; Medina-Rivera, A.; Dennis, J. K.

2026-03-03 genetic and genomic medicine 10.64898/2026.03.02.26347226 medRxiv
Top 0.1%
4.3%

Parkinson's disease (PD) is a disabling neurodegenerative disorder with a substantial heritable component. Despite major advances in genome-wide association studies (GWAS), polygenic risk scores (PRS) show reduced predictive performance outside European populations, limiting equitable translation. Latin American populations represent a particularly difficult case because of their characteristic three-way admixture. We evaluated the cross-ancestry transferability of PD PRS in 1,872 PD cases and 1,443 controls of Latin American ancestry using data from the Global Parkinson's Genetics Program (GP2). PRS were constructed using summary statistics from a large European-ancestry GWAS, a moderately sized mixed-ancestry GWAS meta-analysis, and a small ancestry-matched Latin American GWAS. We benchmarked two single-ancestry approaches (PRSice-2 and SBayesRC) against two multi-ancestry methods (PRS-CSx and BridgePRS) that explicitly model cross-population genetic architecture. Across all performance metrics, SBayesRC performed best. PRS derived from large European GWAS achieved the highest effect size (odds ratio = 2.02; pseudo-R² = 0.031) while PRS derived from mixed ancestry GWAS meta-analysis yielded the highest discriminative ability (AUC=0.67). Our findings demonstrate that, under current sample size imbalances, well-powered European discovery GWAS outperform ancestry-matched but underpowered datasets in three-way admixed populations. Incorporating functional annotations, as implemented in SBayesRC, improves portability across ancestries. However, the full potential of multi-ancestry PRS methods will require substantially larger ancestry-matched discovery GWAS, underscoring the urgent need to expand genetic studies in underrepresented populations.

9
The prevalence of protein misfolding as a mechanism for hereditary deafness

Gogal, R. A.; Cox, G. M.; Kolbe, D. L.; Odell, A. M.; Ovel, C. E.; McCormick, K. I.; Hong, B.; Azaiez, H.; Casavant, T. L.; Smith, R. J. H.; Braun, T. A.; Schnieders, M. J.

2026-03-11 genetics 10.64898/2026.03.09.710547 medRxiv
Top 0.1%
4.3%

Hearing loss is the most common sensory deficit, impacting ~5% of the world's population. The Deafness Variation Database (DVD) is a public resource of deafness variants, containing over 380,000 missense variants across 224 genes, with 303,577 classified as a variant of uncertain significance (VUS). To address the challenge of evaluating each deafness-associated VUS, we evaluate a family of probabilistic frameworks to quantify the strength of computational evidence based on ACMG/AMP recommendations. First, CADD and REVEL are compared using Bayesian models parameterized using either a ClinVar 2019 dataset or labeled DVD variants. The REVEL model built using the DVD dataset demonstrates the best accuracy, sensitivity, and specificity. Incorporation of (in)tolerance to missense variation based on sorting each gene into three bins (tolerant, average, intolerant) shows that intolerant DVD genes are consistent with a higher prior probability of being pathogenic (25.7%) than average (10.7%) or tolerant (8.7%) genes. Finally, the impact of protein folding stability was incorporated using a 2D likelihood, which surpassed the simpler models while also offering a biophysical rationale for the disease mechanism. The protein folding-informed Bayesian model results in 28,866 prioritized VUSs reaching a posterior probability of pathogenicity above 98% with a false positive rate of only 0.14%. Overall, 54,752 missense variants are predicted to cause protein folding destabilization of greater than 1.0 kcal/mol, while 18,706 of the 28,886 prioritized VUS (65%) surpass this threshold. From these VUSs, we identify twelve probands where the patient's genetic diagnosis is upgraded to likely pathogenic/pathogenic. We highlight two variants that cause clear structural disruption, demonstrating the impact of biophysical characterization on variant evaluation.
Author Summary: We investigate the impacts of single amino acid changes on protein structure and folding in the context of hearing loss. Hearing loss is the most common impairment of the main senses, affecting nearly 5% of the world's population. About 45% of people with hearing loss receive a diagnosis after targeted genetic testing. Here, we integrate biophysical data that quantifies the effect of a change to protein sequence on protein folding in combination with genetic data to improve our ability to identify protein amino acid changes that are likely to impact hearing. Our work leads to 12 patients receiving an upgraded diagnosis with their variant disrupting protein stability. Although the method is applied to hearing loss, it can be used for interpreting protein sequence changes in other disease contexts.

10
Identifying genetic regulations on immune cell type proportions and their impacts on autoimmune diseases

Lin, C.; Shen, J.; Sun, J.; Xie, Y.; Xu, L.; Lin, Y.; Hu, J.; Zhao, H.

2026-03-01 genetics 10.64898/2026.02.26.708418 medRxiv
Top 0.1%
3.7%

Genetic regulation of immune cell composition plays a crucial role in the etiology of complex diseases, yet remains poorly understood. We propose a unified analytical framework that integrates genome-wide association studies (GWAS) of cell type proportions with cell-type-wide association studies (cWAS) to systematically characterize both the genetic regulation of immune cell composition and its downstream effects on disease risk. Using single-cell RNA sequencing data from the OneK1K cohort, we conducted a GWAS of immune cell-type proportions with a depth-weighted quasi-binomial model designed for bounded, overdispersed traits. We identified 47 genome-wide significant loci influencing eight fine-labeled immune cell subtypes. Leveraging these identified genetic effects, we further imputed genetically regulated proportions (GRPs) using polygenic risk score (PRS)-based imputation and assessed their associations with complex diseases through cWAS. We identified five significant cell type-disease associations, including two with type 1 diabetes, two with Crohn's disease, and one with ulcerative colitis. Together, our results demonstrate that cell type proportions observed in scRNA-seq can reveal regulatory loci and offer insights into how genetic variations regulate immune cell type proportions to affect disease risk. Although we focused on immune single-cell data, our framework is applicable to other tissues or cellular compositions as scRNA-seq datasets expand.
Author Summary: Genome-wide association studies (GWASs) have uncovered many disease-associated signals, yet most lie in noncoding regions and are difficult to interpret. Mapping GWAS signals to the relevant cell types is therefore important for better understanding the biological mechanisms that drive disease. A major challenge is that observed gene expression and measured cell-type proportions can be influenced by environmental factors and disease status. In contrast, genotypes are less affected by these factors, making them more reliable for interpreting factors of diseases. Moreover, the cell-type proportions are bounded and often skewed, so standard GWAS models that rely on Gaussian assumptions may lose power. To address this, we developed a quasi-binomial approach that better matches the data and improves discovery while controlling false positives. In real data, our method identified more genetic loci associated with cell-type proportions than a traditional linear model. To further investigate how genetic variation regulates immune cell composition to influence disease risk, we integrated our results with disease GWAS summary statistics to identify immune cell types that may contribute to disease susceptibility. Together, our results link disease-associated GWAS signals to specific immune cell types and provide insights into the cellular mechanisms that may underlie these diseases.

11
RGS6 regulates Kappa Opioid Receptor-mediated antinociceptive behaviors

Blount, A.; Sutton, L.

2026-03-06 pharmacology and toxicology 10.64898/2026.03.04.709600 medRxiv
Top 0.1%
3.7%

Targeting the kappa opioid receptor (KOR) system has emerged as a potential alternative to current analgesics; however, advancing the therapeutic development of KOR requires further elucidation of its intracellular signaling events and modulators. Among these intracellular modulators, Regulators of G protein signaling (RGS) proteins act as key modulators of GPCR signaling to shape nociceptive circuits and influence pain processing. Despite this, the molecular diversity of RGS proteins that shape KOR signaling and its behavioral consequences remains largely unexplored. Here we report that RGS6, a member of the R7 RGS family, is highly expressed in nociceptive areas and modulates multiple modalities of KOR-dependent anti-nociception and nocifensive behaviors. Using global single and double knockout mouse models, we show that this anti-nociceptive phenotype was highly specific to RGS6 within the R7 RGS family. Further, we demonstrate that the R7 RGS family displays a lack of functional redundancy in regulation of KOR signaling and behaviors. Using peripherally restricted KOR agonists, we found that KOR-RGS6 anti-nociceptive signaling displays sex differences in a site-specific manner, as females but not males displayed enhanced anti-nociceptive and blunted nocifensive behaviors. Our findings suggest that RGS6 is a highly specific modulator of KOR-dependent anti-nociceptive signaling and plays an essential role in modulating nociceptive circuits, potentially aiding in the development of novel analgesic drugs and therapeutics.

12
Identification of compounds that repress DUX4 expression in facioscapulohumeral muscular dystrophy

Chang, N.; Moore, H. P.; Himeda, C. L.; O'Brien, T. E.; Thomas, W.; Jones, T. I.; Jones, P. L.

2026-03-11 pharmacology and toxicology 10.64898/2026.03.09.710626 medRxiv
Top 0.1%
3.6%

Facioscapulohumeral muscular dystrophy (FSHD) is caused by epigenetic dysregulation of the disease locus, leading to pathogenic misexpression of DUX4 in skeletal muscle. Thus, most FSHD therapeutic approaches target DUX4. Our previous study identified the chromatin remodeling factor BAZ1A (bromodomain adjacent to zinc finger domain protein 1A) as a promising target for therapeutic development. Here we used an artificial intelligence-based screening pipeline to identify molecules predicted to bind the BAZ1A bromodomain, and validated hit compounds using FSHD-specific assays in FSHD myocytes. One compound, termed C06, emerged as a potent and specific repressor of DUX4 and DUX4 target gene expression. Interestingly, while C06 exhibited binding to BAZ1A in vitro, it can also inhibit multiple kinases, including p38, an upstream activator of DUX4. Despite this, at low doses C06 was an equally effective and more specific repressor of DUX4 than losmapimod, which is a robust and specific p38 inhibitor. Thus, C06 is a useful tool for potent and specific DUX4 suppression, and a viable candidate for further development. Our results highlight both the utility and limitations of AI for targeted drug discovery, and the importance of using an FSHD-specific functional screening strategy for selecting relevant candidates.

13
Transcriptome-Wide Alternative Splicing Analysis Implicates Complex Events in Bipolar Disorder

Martinez-Jimenez, M.; Garcia-Ortiz, I.; Romero-Miguel, D.; Kavanagh, T.; Marshall, L. L.; Bello Sousa, R. A.; Sanchez Alonso, S.; Alvarez Garcia, R.; Benavente Lopez, S.; Di Stasio, E.; Schofield, P. R.; Baca-Garcia, E.; Mitchell, P. B.; Cooper, A. A.; Fullerton, J. M.; Toma, C.

2026-04-21 genetic and genomic medicine 10.64898/2026.04.19.26351209 medRxiv
Top 0.1%
3.6%

Alternative-splicing events (ASE) increase transcriptomic variability and play key roles in biological functions. The contribution of ASE to bipolar disorder (BD) remains largely unexplored. We performed a Transcriptome-Wide Alternative-Splicing Analysis (TWASA) to identify ASEs and genes potentially involved in BD. The study comprised 635 individuals: a discovery sample (DS) of 31 individuals from eight multiplex BD families (16 BD cases; 15 unaffected relatives), and a replication sample (RS) of 604 subjects (372 BD cases; 232 controls). Sequencing was conducted on RNA from lymphoblastoid cell lines (DS) and whole blood (RS). TWASA was performed using VAST-TOOLS (VT), rMATS (RM), and MAJIQ/MOCCASIN (MCC). Gene-set association analyses of genes containing ASEs were performed across six psychiatric disorders. Novel ASE (nASE) were investigated in the DS using FRASER. Limited gene overlap was observed across TWASA tools. MCC identified 2,031 complex ASEs involving 1,508 genes, showing the strongest genetic association with BD across psychiatric phenotypes. Prioritization of MCC-identified ASE genes yielded 441 candidates, including DOCK2 as top candidate from the DS. Replication was obtained for 98 genes, five with an identical ASE, and four (RBM26, QKI, ANKRD36, and TATDN2) showing a concordant percentage-spliced-in direction with the DS. Finally, 578 nASE were identified in the DS, with no evidence of familial segregation or differences in ASE types. This first TWASA in BD reveals tool-specific variability, complex ASE for genes specifically associated with BD, and novel candidate genes for BD. Alternative transcript isoform abundance may represent a mechanism contributing to BD pathophysiology.

14
Leveraging the genetics of human face shape boosts the discovery of orofacial cleft risk loci

Herrick, N.; Goovaerts, S.; Manchel, A.; Lee, M. K.; Zhang, X.; Davies, A.; Carlson, J. C.; Leslie-Clarkson, E. J.; Lewis, S. J.; Marazita, M. L.; Cotney, J.; Claes, P.; Shaffer, J. R.; Weinberg, S. M.

2026-02-03 genetic and genomic medicine 10.64898/2026.01.30.26345139 medRxiv
Top 0.1%
3.6%

Several lines of evidence suggest that normal-range facial features and nonsyndromic orofacial clefts (OFCs) exhibit a shared genetic basis. Approaches designed to leverage this relationship hold the possibility of revealing new OFC risk loci by boosting discovery power. To test this idea, we applied a pleiotropy-informed GWAS method (cFDR-GWAS) with summary statistics from large, independent European GWASs of normal facial shape (n=4,680; n=3,566) and nonsyndromic cleft lip with or without cleft palate (nsCL/P, n=3,969). The cFDR approach identified 21 independent genomic loci significantly associated with nsCL/P, providing further evidence of the interconnected genetic architecture between these traits. The five original nsCL/P GWAS signals were detected and joined by nine additional loci previously implicated in other OFC association studies. The remaining seven loci represent new nsCL/P genomic regions, and three of these replicated (P < 0.05) in an independent nsCL/P cohort: ASPSCR1, MSX2, and RALYL. A relaxed 10% cFDR-GWAS threshold identified 15 more independent loci with comparable effect sizes to those detected at the strict 5% threshold, two of which replicated: FHOD3 and SMARCA2. Gene expression patterns in major cell types and spatial transcriptomics data highlighted our gene candidates' roles in craniofacial development. In conclusion, the application of an empirical Bayesian strategy to draw on association signals from genetically related traits can boost the power to identify and prioritize OFC risk loci missed by agnostic gene mapping approaches. These results hold promise that the cFDR-GWAS approach may be able to enhance our understanding of the genetic architecture of other structural birth defects.

15
Dissecting the relationship between haplotypes around ATXN2 CAG repeats and the number of CAA interruptions by long-read sequencing

Lee, B. H.; Chan, J.; McMillan, C.; NYGC ALS Consortium; Song, Y.; Amado, D. A.; Wang, K.

2026-03-12 genetic and genomic medicine 10.64898/2026.03.11.26348169 medRxiv
Top 0.1%
3.6%

CAG repeat expansions in ATXN2 are implicated as risk factors for neurological diseases, including amyotrophic lateral sclerosis (ALS) when 27-33 CAG (intermediate) repeats are present. However, how haplotypes around the repeats and CAA interruptions within the repeats are associated with diseases remains poorly understood. Here, we used long-read sequencing on the Oxford Nanopore Technologies (ONT) platform to simultaneously infer haplotypes around ATXN2, the number of CAG repeats, and the number of CAA interruptions. We found that haplotypes around ATXN2 and the number of interruptions show ethnicity-specific and ALS-specific distributions. Three CAA interruptions are present at low prevalence (~1%) in control populations in multiple ancestry groups, but at high prevalence (~55%) in ALS individuals with intermediate repeats. Furthermore, we examined 159 individuals with ALS (~90% European ancestry) with intermediate ATXN2 repeats and found a unique haplotype in ALS individuals with three CAA interruptions, which can be tagged by an SNV, rs148019457. We further sequenced 41 individuals (EUR = 39) with neurological diseases with intermediate repeats by ONT, and validated that the rs148019457-G allele is only present in haplotypes with three CAA interruptions. Our study shows that three CAA interruptions are rare in healthy controls but are common in individuals with intermediate ATXN2 CAG repeats and neurological disorders, and that rs148019457 tags a specific haplotype with three CAA interruptions in individuals of European ancestry. These results have implications for the development of precision genomic medicine for neurological disorders, and the tag SNV may help identify those with interruptions from existing microarray genotyping data.

16
Deriving LD-adjusted GWAS summary statistics through linkage disequilibrium deconvolution

Nouira, A.; Favre Moiron, M.; Tournaire, M.; Verbanck, M.

2026-04-11 genetic and genomic medicine 10.64898/2026.04.10.26350574 medRxiv
Top 0.1%
3.6%

Genome-wide association studies (GWAS) have identified numerous genetic variants associated with complex traits. However, linkage disequilibrium (LD) confounds these associations, leading to false positives where non-causal variants appear associated because they are correlated with nearby causal variants. This is particularly the case in highly polygenic traits where the genome can be saturated with causal variants. To address this issue, we propose LDeconv, a method based on truncated singular value decomposition (SVD) that adjusts GWAS summary statistics without requiring individual-level genotype data. This approach accounts for LD structure, isolates causal variants in high-LD regions, and improves the reliability of effect size estimates. We assess its performance through simulations across various LD scenarios, conduct extensive sensitivity analyses, and apply it to real GWAS data from the UK Biobank. Our results demonstrate that LDeconv effectively reduces false discoveries while preserving true associations, offering a robust framework for post-GWAS analysis.
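As a rough illustration of the general idea, the sketch below applies a rank-limited pseudo-inverse of an LD matrix to marginal z-scores. It assumes the textbook approximation that marginal z-scores equal the LD matrix times the joint z-scores. This is not the LDeconv implementation, and the function name, toy LD matrix, and rank values are all hypothetical.

```python
import numpy as np


def truncated_svd_deconvolve(z_marginal, ld_matrix, rank):
    """Apply a rank-limited pseudo-inverse of the LD matrix to marginal z-scores.

    Hypothetical helper for illustration only; not the LDeconv code.
    Assumes z_marginal is approximately ld_matrix @ z_joint and rank >= 1.
    """
    z = np.asarray(z_marginal, dtype=float)
    r = np.asarray(ld_matrix, dtype=float)
    # np.linalg.svd returns singular values in descending order.
    u, s, vt = np.linalg.svd(r)
    k = min(rank, len(s))
    # Keep only the k largest components: inverting tiny singular values
    # would amplify noise in the summary statistics.
    pinv = vt[:k].T @ np.diag(1.0 / s[:k]) @ u[:, :k].T
    return pinv @ z


# Toy example: SNPs 0 and 1 are in strong LD (r = 0.9); only SNP 0 is causal.
ld = np.array([[1.0, 0.9, 0.0],
               [0.9, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
z_joint_true = np.array([3.0, 0.0, 0.0])
z_marginal = ld @ z_joint_true  # marginal signal leaks onto SNP 1

full = truncated_svd_deconvolve(z_marginal, ld, rank=3)       # exact inverse
truncated = truncated_svd_deconvolve(z_marginal, ld, rank=2)  # drops the smallest component
```

With all three components retained, the noise-free toy signal is attributed entirely to SNP 0. Dropping the smallest component splits it evenly between the two correlated SNPs, which shows the trade-off between resolution and stability that choosing the truncation rank involves.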

17
Constructing a Literature-Derived Database for Benchmarking Polygenic Risk Score Construction Methods with Spectral Ranking Inferences

Sebastian, C.; Yu, M.; Jin, J.

2026-03-03 genetic and genomic medicine 10.64898/2026.03.01.26347258 medRxiv
Top 0.1%
3.5%

Polygenic risk scores (PRSs) have emerged as a valuable tool for genetic risk prediction and stratification in human diseases. Over the past decade, extensive methodological efforts have focused on improving the predictive power of PRS, leading to the development of numerous methods for PRS construction. Benchmarking these methods is therefore essential for guiding future PRS applications. While studies have benchmarked subsets of these methods on specific phenotypes and cohorts, the resulting evidence remains fragmented, and no work has comprehensively assessed the relative performance of the various PRS methods. In this study, we addressed this gap by systematically constructing a PRS method benchmarking database synthesizing published results from 2009 to 2025. We applied a spectral ranking inference framework with uncertainty quantification to rank 14 PRS methods that had been adequately compared against each other in the literature. We constructed rankings using two complementary sources: original method-development studies and applications/benchmarking studies. While the highest-ranked methods (LDpred2 and AnnoPred) and the lowest-ranked method (C+T) were consistently identified from both sources, the relative ordering of most methods showed moderate variability. We further constructed phenotype-specific rankings, providing more detailed insights into the robustness and phenotype-specific strengths of individual methods. Collectively, the overall and phenotype-specific rankings of the PRS methods, along with the curated benchmarking data from the literature, provide a dynamic and practical reference database that can be continually updated with emerging PRS methods and published benchmarking results to guide future PRS applications.

18
Zfp750 prevents oral adhesions and promotes temporary epithelial fusions

Singh, S. K.; Adelizzi, E.; Heffner, C.; Curtis, S.; Duncan, K.; Awotoye, W.; Olotu, J.; Busch, T.; Adeyemo, W.; Gowans, L. J. J.; Naicker, T.; Murray, S. A.; Butali, A.; Leslie-Clarkson, E. J.; Dunnwald, M.; Cornell, R. A.

2026-02-14 genetics 10.64898/2026.02.12.705205 medRxiv
Top 0.1%
3.2%

The differentiation cascade that converts basal keratinocytes into suprabasal layers, including periderm, depends on the activity of transcription factors. Mutations in the genes encoding many of these transcription factors, including TP63, IRF6 and GRHL3, disrupt periderm development. Such mutations can also interfere with embryonic fusion and septation events that depend on periderm development, including palatogenesis, digit separation and the formation of temporary epithelial fusions between digits, between eyelids, and between pinnae and the scalp. ZNF750 (Zfp750 in the mouse) is a transcription factor required for keratinocyte differentiation, but whether mutations in ZNF750 contribute to risk of orofacial cleft (OFC), and the role of Zfp750 in periderm development, are unknown. To address these questions, we sequenced ZNF750 in 5,659 individuals including 2,125 with nonsyndromic OFC. We identify 33 rare missense variants with frequencies less than 0.1% in gnomAD. Of these, about half are predicted to be damaging by in silico tools. Collectively, these missense variants are not overtransmitted from parents to children with OFCs. Two of the variants have lower activity than the reference variant in a zebrafish embryo-based assay but no phenotype in the corresponding murine model. However, in murine embryos homozygous for a frameshift mutation in Zfp750 (Zfp750fs) that we generated, palatal shelves are fused but intra-oral adhesions are present, a phenotype seen in murine mutants of several bona fide OFC genes. In addition, temporary epithelial fusions are absent in Zfp750fs neonates. RNA sequencing of forelimbs from Zfp750fs embryos reveals decreased expression of epidermal terminal differentiation genes, and both increased and decreased expression of distinct periderm genes. Immunofluorescence shows the consistent presence of periderm proteins within the oral adhesions in Zfp750fs/fs embryos. Together these studies suggest that while mutations in ZNF750 are not a major contributor to OFC risk, Zfp750 does contribute to periderm-dependent morphogenic events.

19
CETP alternative splicing variation impacts human traits

Gamache, I.; Legault, M.-A.; Grenier, J.-C.; Rheaume, E.; Tardif, J.-C.; Dube, M.-P.; Hussin, J.

2026-03-18 genetics 10.64898/2026.03.16.712143 medRxiv
Top 0.1%
3.2%

The cholesteryl ester transfer protein (CETP) is an important protein in reverse cholesterol transport and has been identified as a significant factor associated with cardiovascular disease (CVD), making it a widely studied pharmaceutical target. Three protein-coding isoforms of CETP exist, distinguished by the alternative splicing of one exon each. The isoform primarily responsible for cholesterol-related functions in the plasma is well studied, but specific functions of each isoform remain poorly understood. In this study, we demonstrate the significance of considering CETP's isoforms in analyses of human traits. Using bulk RNA-seq data from multiple tissues, we characterized the expression patterns and genetic regulation determinants of CETP transcripts. Leveraging publicly available GWAS summary statistics, we conducted multivariable Mendelian Randomisation (MVMR) to estimate the impact of variation in isoform proportions on phenotypes, highlighting the importance of CETP's isoforms in pituitary and thyroid glands. Furthermore, we uncovered tissue-specific associations between CETP's isoforms and CVD-associated phenotypes. Additionally, we observed that the epistatic interaction previously reported between CETP and ADCY9, a gene implicated in modulating a CETP modulator's response, may be mediated through the regulation of alternative splicing of exon 9. Our results underscore the importance of a comprehensive understanding of CETP's isoforms, which can significantly impact both fundamental and clinical research efforts.

20
A meta-analysis of clinically ascertained lipoedema cohorts from the UK and Spain identifies overlapping susceptibility loci with the UK Biobank

Dobbins, S. E.; Forner-Cordero, I.; Amigo Moreno, R.; Southgate, L.; Hobbs, K.; Moy, R.; Adjei, M.; Muntane, G.; Vilella, E.; Martorell, L.; Gordon, K.; Ostergaard, P. E.; Pittman, A.

2026-02-12 genetic and genomic medicine 10.64898/2026.02.11.26345915 medRxiv
Top 0.1%
2.7%

Lipoedema is a chronic adipose tissue disorder that mainly affects women and is characterized by excess subcutaneous fat deposition on the lower limbs, associated with pain and tenderness. There is often a family history of lipoedema, suggesting a genetic origin, but the contribution of genetics is not well studied. We conducted a genome-wide association study (GWAS) for this disorder in a clinically ascertained cohort from Spain and performed a meta-analysis with the UK lipoedema cohort GWAS. We then used the results of this study as a replication of the inferred UK Biobank (UKBB) "lipoedema phenotype" study. Whilst our meta-analysis alone did not identify any genome-wide significant associations, our clinical cohorts provide support for three loci identified through the UKBB study: the chr2q24.3 GRB14-COBLL1 locus (rs6753142, P_META = 1.64 × 10^-6), the chr6p21.1 VEGFA locus (rs4711750, P_META = 8.99 × 10^-7) and the chr5q11.2 ANKRD55-MAP3K1 locus (rs3936510, P_META = 1.67 × 10^-5). We identify numerous rare SNPs with strong association signals in our meta-analysis (P < 1 × 10^-6) with support in both UK and Spanish datasets, three of which also show nominal support in the UKBB (P < 0.05). These findings provide a starting point towards understanding the genetic basis of clinical lipoedema and demonstrate the utility of combining large-scale biobank genetic data with clinically ascertained cohorts to elucidate the genetic architecture of lipoedema.
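The abstract does not state which meta-analysis model was used. As a minimal sketch, assuming a standard inverse-variance-weighted fixed-effect model, per-cohort effect estimates for one SNP could be combined as follows. The function name and the input numbers are hypothetical and are not taken from the study.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effect meta-analysis for one variant.

    Returns (pooled beta, pooled standard error, z-score, two-sided p-value).
    Illustrative only; the preprint's actual model is not specified in the abstract.
    """
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value under a standard normal null.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical per-cohort estimates (e.g. a UK cohort and a Spanish cohort).
pooled_beta, pooled_se, pooled_z, pooled_p = fixed_effect_meta(
    betas=[0.2, 0.4], standard_errors=[0.1, 0.1]
)
```

With equal standard errors the pooled estimate is the simple average of the two cohort effects, and the pooled standard error shrinks by a factor of the square root of two, which is why combining modest cohorts can bring suggestive signals closer to significance.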